Description

The invention relates to electronic image methods and devices, and, more particularly but not exclusively, to digital
communication and storage systems with compressed images.

Video communication (television, teleconferencing, Internet, and so forth) typically transmits a stream of video
frames (pictures, images) along with audio over a transmission channel for real time viewing and listening or storage.
However, transmission channels frequently add corrupting noise and have limited bandwidth. Consequently, digital
video transmission with compression enjoys widespread use. In particular, various standards for compression of digital
video have emerged and include H.261, MPEG-1, and MPEG-2, with more to follow, including H.263 and MPEG-4, which
are in development. There are similar audio compression methods.

Tekalp, Digital Video Processing (Prentice Hall 1995), Clarke, Digital Compression of Still Images and Video (Academic
Press 1995), and Schafer et al., Digital Video Coding Standards and Their Role in Video Communications, 83
Proc. IEEE 907 (1995), include summaries of various compression methods, including descriptions of the H.261,
MPEG-1, and MPEG-2 standards plus the H.263 recommendations and indications of the desired functionalities of
MPEG-4.

H.261 compression uses interframe prediction to reduce temporal redundancy and discrete cosine transform (DCT)
on a block level together with high spatial frequency cutoff to reduce spatial redundancy. H.261 is recommended for 
use with transmission rates in multiples of 64 Kbps (kilobits per second) to 2 Mbps (megabits per second). 

The H.263 recommendation is analogous to H.261 but for bitrates of about 22 Kbps (twisted pair telephone wire
compatible) and with motion estimation at half-pixel accuracy (which eliminates the need for the loop filtering available in
H.261) and overlapped motion compensation to obtain a denser motion field (set of motion vectors) at the expense of
more computation, and with adaptive switching between motion compensation with 16 by 16 macroblocks and 8 by 8 blocks.

MPEG-1 and MPEG-2 also use temporal prediction followed by two-dimensional DCT transformation on a block
level as H.261 does, but they make further use of various combinations of motion-compensated prediction, interpolation,
and intraframe coding. MPEG-1 aims at video CDs and works well at rates of about 1-1.5 Mbps for frames of about 360
pixels by 240 lines and 24-30 frames per second. MPEG-1 defines I, P, and B frames, with I frames intraframe coded, P frames
coded using motion-compensated prediction from previous I or P frames, and B frames using motion-compensated
bi-directional prediction/interpolation from adjacent I and P frames.

MPEG-2 aims at digital television (720 pixels by 480 lines) and uses bitrates up to about 10 Mbps with MPEG-1
type motion compensation with I, P, and B frames plus added scalability (a lower bitrate may be extracted to transmit
a lower resolution image).

However, the foregoing MPEG compression methods result in a number of unacceptable artifacts, such as blockiness
and unnatural object motion, when operated at very-low-bit-rates. Because these techniques use only the statistical
dependencies in the signal at a block level and do not consider the semantic content of the video stream, artifacts
are introduced at the block boundaries under very-low-bit-rates (high quantization factors). Usually these block
boundaries do not correspond to physical boundaries of the moving objects, and hence visually annoying artifacts result.
Unnatural motion arises when the limited bandwidth forces the frame rate to fall below that required for smooth motion.

MPEG-4 is to apply to transmission bitrates of 10 Kbps to 1 Mbps and is to use a content-based coding approach
with functionality such as scalability, content-based manipulations, robustness in error prone environments, multimedia
data access tools, improved coding efficiency, ability to encode both graphics and video, and improved random
access. A video coding scheme is considered content scalable if the number and/or quality of simultaneous objects
coded can be varied. Object scalability refers to controlling the number of simultaneous objects coded, and quality
scalability refers to controlling the spatial and/or temporal resolutions of the coded objects. Scalability is an important
feature for video coding methods operating across transmission channels of limited bandwidth and also channels where
the bandwidth is dynamic. For example, a content-scalable video coder has the ability to optimize the performance in
the face of limited bandwidth by encoding and transmitting only the important objects in the scene at a high quality. It
can then choose to either drop the remaining objects or code them at a much lower quality. When the bandwidth of
the channel increases, the coder can then transmit additional bits to improve the quality of the poorly coded objects.

Shapiro, Embedded Image Coding Using Zerotrees of Wavelet Coefficients, 41 IEEE Tr.Sig.Proc. 3445 (1993), provides
a wavelet hierarchical subband decomposition which groups wavelet coefficients at different scales and predicts zero
coefficients across scales. This provides a quantization and fully embedded bitstream in the sense that the bitstream of
a lower bitrate is embedded in the bitstream of higher bitrates.

Villasenor et al., Wavelet Filter Evaluation for Image Compression, 4 IEEE Tr.Image Proc. 1053 (1995), discusses
the wavelet subband decomposition with various mother wavelets.
However, more efficient coding at low bitrates remains a problem. 

Hardware and software implementations of the JPEG, H.261, MPEG-1, and MPEG-2 compression and decoding
exist. Further, programmable microprocessors or digital signal processors, such as the Ultrasparc or TMS320C6x, running
appropriate software can handle most compression and decoding, and less powerful processors may handle lower
bitrate compression and decompression.

An illustrative embodiment of the present invention seeks to provide a method for video compression and decoding
that avoids or minimizes the above-mentioned problems. Further and different aspects of the invention are specified in the
claims.

An embodiment of the present invention provides video compression and decoding with predictive embedded
zerotree coding applied to a hierarchical decomposition (including wavelet) with a single symbol for significant coefficients
plus, optionally, an additional symbol for significant coefficients with all zero descendants.
This has the advantage of decreasing the bits required for coding with little compensation required in a decoder,
and it can be used for JPEG or MPEG I frames or objects.

A further embodiment of the present invention also provides video systems with applications for this coding, such
as video telephony and fixed camera surveillance for security, including time-lapse surveillance, with digital storage in
random access memories.

For a better understanding of the present invention, reference will now be made, by way of example, to the following
description of embodiments of the invention and to the accompanying schematic drawings, in which:

Figure 1 is a flow diagram for a preferred embodiment encoding;

Figures 2a-c illustrate subband hierarchical decomposition;

Figures 3a-d show coefficient encoding;

Figures 4a-b show flow for a dominant pass and the states;

Figures 5a-c indicate empirical results;

Figure 6 shows a preferred embodiment telephony system;

Figure 7 illustrates a preferred embodiment surveillance system; and

Figure 8 is a flow diagram for a preferred embodiment video compression method.

Figure 1 is a flow diagram of a single frame modified zerotree first preferred embodiment image or frame encoding
using wavelet hierarchical decomposition. For simplicity, the flow diagram will be explained with the help of an example:
presume a frame of 144 rows of 176 pixels with 8-bit pixels (-128 to +127), and presume four scale levels in a
wavelet hierarchical decomposition. The value of the pixel at (j,k) may be denoted x(j,k) for 0 ≤ j ≤ 143 and 0 ≤ k ≤ 175.

To begin the hierarchical decomposition, first filter the 144 by 176 frame with each of the four filters h0(j)h0(k),
h0(j)h1(k), h1(j)h0(k), and h1(j)h1(k) to give 144 by 176 filtered frames (boundary pixel values are used to extend the
frame for computations which would otherwise extend beyond the frame). A computationally simple h0(k) function
equals 1/√2 at k = 0, 1 and is zero for all other k; h1(k) equals 1/√2 at k = 0, -1/√2 at k = 1, 1/(8√2) at k = 2, 3,
-1/(8√2) at k = -1, -2, and zero for all other k. The Villasenor article cited earlier lists other filter functions. The filtering
is mathematically convolution with the functions, so h0 is a lowpass filter in one dimension (it averages over two adjacent
pixels) and h1 is a highpass filter in one dimension (essentially a difference of adjacent pixels). Thus the four filters are
two-dimensional lowpass-lowpass, lowpass-highpass, highpass-lowpass, and highpass-highpass, respectively.
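The separable filtering followed by even-index subsampling can be sketched in plain Python. This is a minimal sketch under stated assumptions: it uses the 2/6 taps given above, extends the frame by repeating boundary pixels, and the helper names (`filter_1d`, `analyze_level`) are hypothetical; it is an illustration, not the embodiment's implementation.

```python
import math

S = 1 / math.sqrt(2)
# Taps from the text, as {index k: value}.
H0 = {0: S, 1: S}
H1 = {0: S, 1: -S, 2: S / 8, 3: S / 8, -1: -S / 8, -2: -S / 8}


def filter_1d(signal, taps):
    """Convolution y(n) = sum_m h(m) x(n - m); out-of-range indices reuse the boundary pixel."""
    n = len(signal)
    return [
        sum(h * signal[min(max(i - m, 0), n - 1)] for m, h in taps.items())
        for i in range(n)
    ]


def analyze_level(frame):
    """One decomposition level: four separable filterings, then keep pixels with j and k even."""
    rows_lo = [filter_1d(row, H0) for row in frame]  # h0 along k
    rows_hi = [filter_1d(row, H1) for row in frame]  # h1 along k

    def columns(arr, taps):
        # Filter every column (along j) and transpose back to row-major order.
        cols = [filter_1d(list(col), taps) for col in zip(*arr)]
        return [list(r) for r in zip(*cols)]

    def subsample(arr):
        return [row[::2] for row in arr[::2]]

    return {
        "LL": subsample(columns(rows_lo, H0)),  # h0(j)h0(k)
        "LH": subsample(columns(rows_hi, H0)),  # h0(j)h1(k)
        "HL": subsample(columns(rows_lo, H1)),  # h1(j)h0(k)
        "HH": subsample(columns(rows_hi, H1)),  # h1(j)h1(k)
    }


frame = [[10.0] * 176 for _ in range(144)]
bands = analyze_level(frame)
print(len(bands["LL"]), len(bands["LL"][0]))
```

Because the h1 taps sum to zero, a flat frame gives zero LH1, HL1, and HH1 subarrays, while each LL1 coefficient is twice the pixel value (a gain of √2 per dimension); each subarray is 72 by 88.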

Next, subsample each filtered frame by a factor of four by retaining only pixels at (j,k) with j and k both even
integers. This subsampling will yield four 72 by 88 subarrays of wavelet coefficients, denoted LL1, LH1, HL1, and HH1,
respectively, with coefficient locations (j,k) relabelled for 0 ≤ j ≤ 71 and 0 ≤ k ≤ 87. This forms the first level of the
decomposition, and the four subarrays can be placed together to form a single 144 by 176 array, which makes
visualization of the decomposition simple, as illustrated in Figure 2a. Thus LL1 is a lower resolution version of the original
frame and could be used as a compressed version of the original frame. The values of the pixels in these filtered and
subsampled images are the first level wavelet coefficients.

The LL1, LH1, HL1, and HH1 subarrays can be used to reconstruct the original frame by first interpolating each
subarray by a factor of four (to restore the 144 by 176 size), then filtering the four 144 by 176 arrays with filters g0(j)g0(k),
g0(j)g1(k), g1(j)g0(k), and g1(j)g1(k), respectively, and lastly adding these four filtered images together pixelwise.
The functions g0 and g1 are lowpass and highpass filters, respectively, and relate to h0 and h1 by g0(n) = (-1)^n h1(n)
and g1(n) = (-1)^(n+1) h0(n). The h0, h1, g0, and g1 functions are symmetric about 1/2, rather than about 0 as would be the
case for an odd-tap filter, so after reconstruction the pixel index is shifted by 1 to adjust for the two 1/2 pixel shifts during
the two filterings.

The second level in the decomposition simply repeats the four filterings with the h0 and h1 functions plus subsampling
by a factor of four, but using the LL1 subarray as the input. Thus the four filtered subarrays are each 36 by 44
and are denoted LL2, LH2, HL2, and HH2. As before, and as shown in Figure 2b, LL2, LH2, HL2, and HH2 can be
arranged to visualize the decomposition of LL1 and also could be used for reconstruction of LL1 with the g0 and g1
based filters. The LH1, HL1, and HH1 subarrays of first level coefficients remain unfiltered.
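A one-dimensional check of the analysis/synthesis pair can be sketched as follows. It assumes periodic (wrap-around) extension rather than the boundary-pixel extension used in the text, so that reconstruction is exact, and it takes the synthesis filters as g0(n) = (-1)^n h1(n) and g1(n) = (-1)^(n+1) h0(n); the function names are hypothetical. The output reproduces the input delayed by one sample, which is the index shift of 1 described above.

```python
import math

S = 1 / math.sqrt(2)
# Analysis filters from the text, as {index n: value}.
H0 = {0: S, 1: S}
H1 = {-2: -S / 8, -1: -S / 8, 0: S, 1: -S, 2: S / 8, 3: S / 8}
# Synthesis filters: g0(n) = (-1)^n h1(n), g1(n) = (-1)^(n+1) h0(n).
G0 = {n: (-1) ** n * h for n, h in H1.items()}
G1 = {n: (-1) ** (n + 1) * h for n, h in H0.items()}


def circular_filter(x, taps):
    """y(n) = sum_m h(m) x(n - m), with indices taken modulo len(x)."""
    size = len(x)
    return [sum(h * x[(i - m) % size] for m, h in taps.items()) for i in range(size)]


def analyze(x):
    """Filter with h0 and h1, then keep the even-indexed samples."""
    return circular_filter(x, H0)[::2], circular_filter(x, H1)[::2]


def synthesize(low, high):
    """Interpolate by inserting zeros, filter with g0 and g1, and add the results."""
    def upsample(c):
        u = [0.0] * (2 * len(c))
        u[::2] = c
        return u

    a = circular_filter(upsample(low), G0)
    d = circular_filter(upsample(high), G1)
    return [p + q for p, q in zip(a, d)]


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0, -5.0, 8.0]
y = synthesize(*analyze(x))  # y(n) equals x(n - 1) up to rounding
```

The signal length must be even so that the halved subarrays tile it exactly; the same argument applies separably to the two-dimensional filters.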
Repeat this decomposition on LL2 by filtering with the four filters based on h0 and h1 followed by subsampling to
obtain LL3, LH3, HL3, and HH3, which are 18 by 22 coefficient subarrays. Again, LL3, LH3, HL3, and HH3 can be
arranged to visualize the decomposition of LL2, as shown in Figure 2c.

Complete the hierarchical four level decomposition of the original frame by a last filtering with the four filters based
on h0 and h1 followed by subsampling of LL3 to obtain LL4, LH4, HL4, and HH4, which are each a subarray of 9 rows of
11 coefficients. Figure 2c illustrates all of the resulting subarrays arranged to form an overall 144 by 176 coefficient array.
Figure 2c also indicates the tree relation of coefficients in various levels of the decomposition; indeed, a coefficient
w(j,k) in LH4 is the result of filtering and subsampling of coefficients x(j,k) in LL3:

$$
\begin{aligned}
w(j,k) ={}& h_0(0)h_1(0)\,x(2j,2k) + h_0(0)h_1(1)\,x(2j,2k-1) + h_0(0)h_1(-1)\,x(2j,2k+1) \\
&+ h_0(0)h_1(2)\,x(2j,2k-2) + h_0(0)h_1(-2)\,x(2j,2k+2) + h_0(0)h_1(3)\,x(2j,2k-3) \\
&+ h_0(1)h_1(0)\,x(2j-1,2k) + h_0(1)h_1(1)\,x(2j-1,2k-1) + h_0(1)h_1(-1)\,x(2j-1,2k+1) \\
&+ h_0(1)h_1(2)\,x(2j-1,2k-2) + h_0(1)h_1(-2)\,x(2j-1,2k+2) + h_0(1)h_1(3)\,x(2j-1,2k-3) \\
={}& \sum_{m=0}^{1} \sum_{n=-2}^{3} h_0(m)\,h_1(n)\,x(2j-m,\,2k-n)
\end{aligned}
$$

Because the filtering plus subsampling is basically computing w(j,k) from the 2 by 2 area in LL3 (the values of h1(k)
are small except for k = 0, 1), there are four coefficients (x(2j-1,2k-1), x(2j-1,2k), x(2j,2k-1), and x(2j,2k)) in LL3 which
determine w(j,k) in LH4. Now each of these four coefficients in LL3 is related to a corresponding one of the four
coefficients in LH3 at the same positions ((2j-1,2k-1), (2j-1,2k), (2j,2k-1), and (2j,2k)) because they were computed from
essentially the same 4 by 4 location in LL2. Thus the coefficient w(j,k) in LH4 is called the parent of the four related
coefficients z(2j-1,2k-1), z(2j-1,2k), z(2j,2k-1), and z(2j,2k) in LH3, and each of these four coefficients in LH3 is a child
of the parent coefficient in LH4. This terminology extends to LH2 and LH1. The generic term descendant includes
children, children of children, and so forth. See Figure 2c showing descendants.
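The parent-child relation reduces to index arithmetic, sketched below. This minimal sketch assumes 0-based indices, for which the children of (j, k) are (2j, 2k), (2j, 2k+1), (2j+1, 2k), and (2j+1, 2k+1); this matches the (2j-1, 2j) labelling above when indices are counted from 1. The function names are hypothetical.

```python
def children(j, k):
    """Positions of the four children of (j, k) in the next finer subarray of the same orientation."""
    return [(2 * j, 2 * k), (2 * j, 2 * k + 1), (2 * j + 1, 2 * k), (2 * j + 1, 2 * k + 1)]


def descendants(j, k, level):
    """Map each finer level to the descendant positions of a coefficient at `level` (LH4 -> 4)."""
    result = {}
    generation = [(j, k)]
    for finer in range(level - 1, 0, -1):
        generation = [c for p in generation for c in children(*p)]
        result[finer] = generation
    return result


# The last coefficient of the 9 by 11 LH4 subarray.
d = descendants(8, 10, 4)
print({lvl: len(pos) for lvl, pos in d.items()})  # {3: 4, 2: 16, 1: 64}
```

The 4, 16, and 64 descendants in LH3, LH2, and LH1 are the counts used by the zerotree symbols, and the last LH4 coefficient maps onto rows 64-71 and columns 80-87 of the 72 by 88 LH1 subarray.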

Using the hierarchical decomposition of the original frame into coefficient subarrays LL4, LH4, ..., HH1, begin the
modified embedded zerotree encoding: first find the maximum of the magnitudes of the coefficients w(j,k) in the
coefficient array (union of subarrays LL4, LH4, HL4, ..., LH1, HL1, HH1), excluding the coefficients in LL4, which will
be separately encoded. Then pick an initial quantization threshold, T0, so that

$$T_0 \le \max |w(j,k)| < 2T_0.$$
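A minimal sketch of the threshold choice follows. It assumes T0 is taken as a power of two (one computationally convenient choice inside the permitted range) and that the LL4 coefficients have already been removed from the input; `initial_threshold` is a hypothetical name.

```python
import math


def initial_threshold(coeffs):
    """Return a power of two T0 with T0 <= max|w| < 2*T0 (power of two is an assumed convenience)."""
    m = max(abs(w) for w in coeffs)
    if m == 0:
        raise ValueError("all coefficients are zero")
    t = 2.0 ** math.floor(math.log2(m))
    # Guard against floating point rounding in log2 near exact powers of two.
    if t > m:
        t /= 2
    elif 2 * t <= m:
        t *= 2
    return t


print(initial_threshold([3.0, -57.5, 12.0]))  # 32.0
```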

The initial threshold value, T0, is encoded (with a variable length code) and made part of the bitstream.
Embedded zerotree encoding essentially encodes w(j,k) by using a binary expansion of w(j,k)/T0, with successive
scans of the coefficient array adding successive bits in the expansion (i.e., bitplanes). This provides a fully embedded
bitstream with resolution increasing over an entire reconstructed image on each scan of the array. For example, a
background transmitted over the Internet could be improved with updates between data transmissions. Scan the
wavelet coefficient array in the order of lowpass to highpass: that is, the subarrays are scanned in the order LH4, HL4,
HH4, LH3, HL3, HH3, LH2, HL2, HH2, LH1, HL1, HH1, with each subarray raster scanned (a row at a time). Thus the
decoder receiving a transmitted bitstream can determine wavelet coefficient location in the coefficient array by order in
the bitstream.

First code the baseband LL4 with pulse code modulation (PCM) or differential PCM (DPCM); LL4 has no descendant
coefficients, and simple zerotree coding does not have any gain over PCM. Indeed, PCM codes each of the 99 (9 by
11) coefficients individually, and successive bitplanes (one bit from each of the 99 coefficients) provide successively
better resolution. Thus PCM can be part of a fully embedded bitstream syntax. DPCM uses fewer bits because the
coefficients are coded as differences from adjacent coefficients, but this disrupts full embedding. LL4 is basically a low
resolution version of the original frame (each wavelet coefficient is essentially an average of the pixels in a 16 by 16
macroblock), so putting a DPCM coded LL4 near the front of the bitstream is practical. Figure 1 shows the case of
DPCM coding LL4; for PCM with full embedding, the looping with threshold decrementing would include the coding of LL4.
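DPCM of the baseband can be sketched as follows. The predictor used here (the previously scanned coefficient in raster order, with 0 before the first) is an assumption for illustration, since the text says only that coefficients are coded as differences from adjacent coefficients; quantization and the code for the differences are omitted, and the function names are hypothetical.

```python
def dpcm_encode(block):
    """Raster-scan DPCM: each coefficient minus the previously scanned one (the first minus 0)."""
    flat = [v for row in block for v in row]
    return [v - p for v, p in zip(flat, [0] + flat[:-1])]


def dpcm_decode(diffs, cols):
    """Undo dpcm_encode with a running sum, then reshape into rows of `cols` coefficients."""
    flat, prev = [], 0
    for d in diffs:
        prev += d
        flat.append(prev)
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


# A smooth 9 by 11 baseband: differences are small except at row starts.
ll4 = [[100 + 3 * j - k for k in range(11)] for j in range(9)]
diffs = dpcm_encode(ll4)
```

The small differences are what make DPCM cheaper than PCM, while the running-sum dependency between coefficients is why it disrupts full embedding.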

Next, raster scan subarray LH4 and encode each of the 99 wavelet coefficients to indicate which of the following
four classes w(j,k) falls into: (i) ZTRZ (zerotree root with zero value) if |w(j,k)| < T0 and all descendants of w(j,k) (4 in
LH3, 16 in LH2, and 64 in LH1) also have magnitudes less than T0; (ii) ZTRS (zerotree root with significant value) if
|w(j,k)| ≥ T0 but all descendants of w(j,k) (4 in LH3, 16 in LH2, and 64 in LH1) have magnitudes less than T0; (iii) IZ
(isolated zero) if |w(j,k)| < T0 but at least one of the descendant wavelet coefficients has magnitude not less than T0;
and (iv) IS (isolated significant coefficient) if |w(j,k)| ≥ T0 and at least one of the descendant wavelet coefficients has
magnitude not less than T0. The 99 coefficients in LH4 are raster scanned, and the encoding generates 198 bits if
each coefficient takes two bits or, preferably, fewer bits if an adaptive arithmetic coding is used. Note that
if a wavelet coefficient in LH4 is encoded as ZTRZ or ZTRS, then the decoder determines that all of the descendant
wavelet coefficients have magnitude less than T0, and so these coefficients need not be coded in this scan of the
coefficient array because they are already known to be insignificant. Figure 3a illustrates the range of coefficient values
and codes associated with them. Also, start a list of significant coefficients and append w(j,k) to the list if w(j,k) is coded
as IS, plus replace it with symbol X in the LH4 subarray. On successive scans of the coefficient array, additional bits
from the binary expansion of the coefficients on the list of significant coefficients will be coded. The decoder
reconstructs the coefficients from the successive bits by knowing the scan order and the array locations from the initial
ZTRS and IS codes. Also, on successive scans the threshold is divided by powers of 2, so the wavelet coefficients initially
encoded as ZTRZ or IZ may be further resolved as ZTRS or IS and appended to the list. Also, after coding the symbols
ZTRS or IS, one additional sign bit is sent.
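The four-way classification, with the sign bit that follows ZTRS or IS, can be sketched as a small function. It assumes the caller supplies the current values of all descendants, with positions already replaced by X passed as None and read as 0 for the comparison; the name `dominant_symbol` and the 0-for-positive sign convention are hypothetical.

```python
def dominant_symbol(w, descendants, t):
    """Classify coefficient w at threshold t; descendants replaced by X are given as None."""
    significant = abs(w) >= t
    # X counts as 0 in threshold comparisons.
    subtree_zero = all(abs(d if d is not None else 0.0) < t for d in descendants)
    if subtree_zero:
        symbol = "ZTRS" if significant else "ZTRZ"
    else:
        symbol = "IS" if significant else "IZ"
    # A sign bit follows ZTRS and IS (0 for positive, 1 for negative is an assumed convention).
    sign_bit = (0 if w > 0 else 1) if significant else None
    return symbol, sign_bit


print(dominant_symbol(40.0, [3.0, -5.0, 1.0, 2.0], 32.0))  # ('ZTRS', 0)
```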

X evaluates as a 0 for threshold comparisons (i.e., as in checking to see whether a parent coefficient is a ZTRZ
or ZTRS) but is skipped rather than coded as a 0 on successive scans. This use of symbol X rather than a 0 as the
replacement for a removed significant coefficient means that fewer coefficients may need coding on subsequent
scans of the array. There is a tradeoff between (1) using the X symbol as the replacement for significant coefficients,
which codes fewer bits by skipping the X symbols on successive scans, and (2) using a 0 as the replacement for
significant coefficients, which will save bits when the 0 is found to be a zerotree root.

Continue with the same coding (ZTRZ, ZTRS, IZ, or IS) for the raster scan of HL4 and then of HH4, along with the
appending to the list of significant coefficients plus replacing by X of coefficients coded as ZTRS or IS.

After completing the fourth level coefficient subarrays, continue the scanning and encoding for the third level
subarrays LH3, HL3, and HH3. In scans of these subarrays, a wavelet coefficient which has a parent coded as ZTRZ or
ZTRS is just skipped; the decoder knows the locations of all descendants of a ZTRZ or ZTRS.

Similarly, scan and encode the remaining subarrays in the order LH2, HL2, HH2, LH1, HL1, and HH1, along with the
appending to the significant coefficient list and replacement by symbol X for the coefficients coded as IS. Figure 4a is
a flow diagram of the entire array scan, where "increment scan variable" means raster scan of the subarrays in
order. The array scan and coding is termed a dominant pass through the array.

The decoder may reconstruct a frame from the coefficient codes by using values of ±3T0/2 for coefficients coded
ZTRS or IS and a value of 0 for coefficients coded ZTRZ or IZ. This encoding essentially is a map of the location (and
sign) of significant coefficients (those with magnitude not less than the threshold).

Next encode the members of the list of significant coefficients in a subordinate pass in which each member has
one more bit coded as follows: if w(j,k) was previously coded as significant and positive (which means T0 ≤ w(j,k) <
2T0), then code a 0 for T0 ≤ w(j,k) < 3T0/2 and a 1 for 3T0/2 ≤ w(j,k) < 2T0. Similarly, for w(j,k) coded as significant and
negative, code a 0 for -2T0 < w(j,k) < -3T0/2 and a 1 for -3T0/2 ≤ w(j,k) ≤ -T0. Note that these are just the second most
significant bits in the binary expansion of w(j,k)/T0: ZTRS, IS, and a sign bit would be the sign and most significant bits
(with negative numbers in two's complement format). Figure 3b heuristically illustrates the range of codes.
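The decoder estimates after the dominant pass and after the first subordinate bit can be sketched as follows. The sketch follows the bit convention just described (for negative values a 1 selects the half nearer zero); reconstructing at the midpoint of the selected interval is an assumption, and the function names are hypothetical.

```python
def dominant_value(symbol, sign, t):
    """Decoder estimate after a dominant pass: +/-3t/2 for ZTRS or IS, 0 for ZTRZ or IZ."""
    if symbol in ("ZTRS", "IS"):
        return sign * 1.5 * t
    return 0.0


def subordinate_bit(w, t):
    """Refinement bit for a coefficient found significant at threshold t (t <= |w| < 2t)."""
    if w > 0:
        return 1 if w >= 1.5 * t else 0
    # Negative values: 1 for the half nearer zero, as in two's complement order.
    return 1 if w >= -1.5 * t else 0


def refined_value(sign, bit, t):
    """Midpoint of the half interval selected by the refinement bit."""
    larger_magnitude = (bit == 1) if sign > 0 else (bit == 0)
    return sign * (1.75 * t if larger_magnitude else 1.25 * t)


t0 = 32.0
print(dominant_value("IS", -1, t0))                          # -48.0
print(refined_value(-1, subordinate_bit(-37.5, t0), t0))     # -40.0
```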

After completing the foregoing scan and ZTRZ-ZTRS-IZ-IS-skip coding (a dominant pass through the coefficient
array) plus the additional bit for the members of the significant coefficient list (subordinate pass), replace T0 by
T1 = T0/2 and repeat the dominant pass and subordinate pass with T1 as the threshold. Figure 3c illustrates the coefficient
ranges for the dominant pass, and Figure 3d illustrates the subordinate pass. The dominant pass typically appends
more coefficients to the list of significant coefficients plus replaces them with Xs, and the subordinate pass adds an
additional bit of resolution for each member of the list. During the dominant pass, the X value of a coefficient in the
array is treated as a 0 for threshold comparisons but is skipped rather than being encoded as ZTRZ or IZ. On average,
this use of X decreases the number of bits that need to be transmitted; see Figures 5a-c illustrating experimental results
of the gain using X in connection with the preferred embodiment.

These successive decreases in the quantization threshold provide increasingly higher resolution reconstructions
of the original frame. Further, the initial threshold, T0, may be selected for computational convenience provided it
lies in the range from half the coefficient maximum to the coefficient maximum.

In the original embedded zerotree algorithm of Shapiro, the wavelet coefficients are coded in several passes. Each
pass encodes one bitplane. The positions of the coefficients that become significant with respect to the new threshold
are encoded efficiently with zerotrees, in which each node of the tree represents the significance of the coefficient at
the node and of the coefficients in the subtree rooted at the current node (one could consider a zerotree as essentially
a significance map). The original embedded zerotree algorithm uses the following symbols to represent the significance
of the nodes: ZTR, IZ, POS, NEG. ZTR represents a node where the coefficient itself is zero and all its descendants
are zero; IZ represents a node where the coefficient itself is zero and not all of its descendants are zero; POS represents
a node where the coefficient itself is positive; and NEG represents a node where the coefficient itself is negative. It can
be shown that statistically the zero wavelet coefficients tend to cluster in the same spatial location, and the conditional
probability of a zero coefficient is much higher given that the parent of the coefficient is zero. This explains why
zerotree quantization reduces the overhead for the significance map and provides good coding efficiency.

The preferred embodiment improves coding efficiency with a different set of symbols: ZTRZ, ZTRS, IZ, IS. ZTRZ
represents the node where the coefficient itself is zero as well as all of its descendants. ZTRS represents the node
where the coefficient itself is nonzero, but all of its descendants are zero. IZ is a node where the coefficient is zero
but not all of its descendants are zero; and IS represents a significant coefficient whose descendants are not all zero.

In comparison with the original symbols, the preferred embodiment replaces the POS and NEG symbols with the one
symbol IS because the probabilities of positive and negative numbers are about equal. Using one symbol reduces the
number of symbols, which reduces complexity and increases the accuracy of probability estimation.
In addition, the preferred embodiment introduces the ZTRS symbol to make significant coefficients permissible
as the root of a zerotree. This addition can be justified with some theoretical analysis. For a given random signal
generated by an autoregressive process, the frequency spectrum of the process is a decaying function with respect to
frequency. The rate of decay must be faster than 1/f. The wavelet coefficients of such signals also decay with respect to
scale at a similar rate. It has been shown that even for 1/f signals, which exhibit infinite energy, wavelet coefficients also
decay with respect to scale. Since in zerotree quantization the threshold is halved at each pass, the probability that a
parent node is nonzero while all its descendants are zero is significant. By introducing the symbol ZTRS, the preferred
embodiment can efficiently represent all the zero descendants of a nonzero root node. Note that for the original
embedded zerotree algorithm, one needs to first send POS or NEG for the significant coefficient and then send four ZTR
symbols to indicate that all descendants are zero. Simulation results also confirm the improvement using the new symbol
set.
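The saving can be illustrated by counting dominant-pass symbols for a single tree under both symbol sets. This sketch works under simplifying assumptions: trees are nested (value, children) tuples, the X replacement and entropy coding are ignored, and the separate sign bit sent after ZTRS or IS is not counted. The function names are hypothetical.

```python
def subtree_insignificant(children, t):
    """True if every coefficient below this node has magnitude below t."""
    return all(abs(v) < t and subtree_insignificant(kids, t) for v, kids in children)


def count_original(node, t):
    """Symbols for one tree with ZTR, IZ, POS, NEG (POS/NEG carry the sign)."""
    value, children = node
    if abs(value) < t and subtree_insignificant(children, t):
        return 1  # one ZTR covers the whole subtree
    return 1 + sum(count_original(c, t) for c in children)  # POS, NEG, or IZ, then children


def count_modified(node, t):
    """Symbols for one tree with ZTRZ, ZTRS, IZ, IS (sign bits not counted)."""
    value, children = node
    if subtree_insignificant(children, t):
        return 1  # one ZTRZ or ZTRS covers the whole subtree
    return 1 + sum(count_modified(c, t) for c in children)  # IZ or IS, then children


# A significant root whose 4 children and 16 grandchildren are all insignificant.
tree = (40.0, [(1.0, [(0.0, [])] * 4)] * 4)
print(count_original(tree, 32.0), count_modified(tree, 32.0))  # 5 1
```

The 5 symbols in the original scheme are the POS plus four ZTR symbols mentioned above; ZTRS replaces them with one symbol and a sign bit.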

Context modeling and forgetting factors for arithmetic coding are discussed in the following paragraphs.

Fixed point arithmetic coding is used to entropy code the zerotree symbols. Arithmetic coding is known to be able
to optimally encode a stationary random sequence if the statistics of the random signal can be estimated accurately. In
practice, arithmetic coding can provide very good coding performance for small symbol sets, which is the case in the
preferred embodiment.

The statistics are estimated with accumulative frequencies. Forgetting factors are used to adjust the adaptation
window size of the frequency estimation. The forgetting factor allows the arithmetic coder to adjust to the local
statistics. However, too small an adaptation window makes the statistics fluctuate too frequently, which in turn degrades
the performance. In the preferred embodiment, choose the forgetting factor to be 127, which empirically gives the best
results.
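The adaptive frequency estimate can be sketched as follows. How the forgetting factor of 127 is applied is not spelled out here, so this sketch assumes one common scheme: all counts are halved (rounding up, so nonzero counts stay nonzero) whenever their total exceeds 127. The class is hypothetical and models only the statistics; the arithmetic coder itself is omitted.

```python
class AdaptiveFrequencyModel:
    """Accumulative symbol frequencies with count halving as the assumed forgetting step."""

    def __init__(self, counts, limit=127):
        self.counts = dict(counts)
        self.limit = limit

    def probability(self, symbol):
        return self.counts[symbol] / sum(self.counts.values())

    def update(self, symbol):
        self.counts[symbol] += 1
        if sum(self.counts.values()) > self.limit:
            # Forget older statistics: halve every count, rounding up (0 stays 0).
            self.counts = {s: (c + 1) // 2 for s, c in self.counts.items()}


model = AdaptiveFrequencyModel({"ZTRZ": 1, "ZTRS": 1, "IZ": 1, "IS": 1})
for _ in range(200):
    model.update("IZ")
print(round(model.probability("IZ"), 4))
```

A larger limit gives a longer adaptation window and smoother statistics; a smaller one tracks local changes faster but fluctuates more, which is the tradeoff described above.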

Most importantly, the preferred embodiment uses context modeling to better estimate the probability distribution
of the symbols. The context is determined by two factors:

(1) the state of the coefficient in the previous pass (bitplane), and

(2) the subband that the coefficient is in.

Simulations show that the probability distribution of the current symbol is highly conditioned on the coefficient's state
in the previous pass. For instance, if a coefficient is a descendant of a zero zerotree root (ZTRZ) in the previous pass,
then its probability of being zero in the current pass is significantly higher than in the case where it is the descendant
of a significant zerotree root (ZTRS). Figure 4b illustrates the state transition graph for a coefficient from a previous
pass to the current pass. The additional symbols DZ and DS are for internal use only, where DZ refers to a descendant
of a ZTRZ symbol and DS refers to a descendant of a ZTRS symbol.

The probability distributions for the various subbands are also quite different. For instance, for the highest subband,
there will be no zerotree roots. When initializing the frequency count for the highest subband, set the frequency counts
for ZTRZ and ZTRS to zero, because they will not appear in that subband.
For the subordinate pass, the probabilities for 1 and 0 are about equal. Therefore, entropy coding could be omitted
for the subordinate pass. In the preferred embodiment, arithmetic coding is nevertheless used to gain a bit more
efficiency. The frequency count is initialized to freq_sub = [1, 1], which represents the frequency counts for 1 and 0,
respectively.
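Context selection can be sketched as a table of frequency counts keyed by (state in the previous pass, subband). The state labels, the treatment of the level-1 subarrays as the "highest subband", and the class name are assumptions for illustration; the zero initial counts for ZTRZ and ZTRS there, and freq_sub = [1, 1] for the subordinate pass, follow the text.

```python
DOMINANT_SYMBOLS = ("ZTRZ", "ZTRS", "IZ", "IS")


class ContextTable:
    """Frequency counts for each (previous-pass state, subband) context."""

    def __init__(self, highest_subbands=("LH1", "HL1", "HH1")):
        self.highest = set(highest_subbands)
        self.tables = {}
        self.freq_sub = [1, 1]  # subordinate pass: counts for 1 and 0

    def counts(self, prev_state, subband):
        key = (prev_state, subband)
        if key not in self.tables:
            init = {s: 1 for s in DOMINANT_SYMBOLS}
            if subband in self.highest:
                # Coefficients here have no descendants, so no zerotree roots can occur.
                init["ZTRZ"] = init["ZTRS"] = 0
            self.tables[key] = init
        return self.tables[key]

    def observe(self, prev_state, subband, symbol):
        self.counts(prev_state, subband)[symbol] += 1


ct = ContextTable()
ct.observe("DZ", "LH3", "ZTRZ")
ct.observe("DZ", "LH3", "ZTRZ")
```

Keeping a separate table per context lets, for example, descendants of a ZTRZ (state DZ) learn a higher zero probability than descendants of a ZTRS (state DS).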

Experimental results are discussed in the following paragraphs. 

Figures 5a-c illustrate the improvement in PSNR at various transmission bitrates from using the preferred
embodiment predictive zerotree (using new symbols) over the standard zerotree, with the baseband separately DPCM
encoded and with the following 9-3 tap Daubechies filter functions for the hierarchical decomposition:

h0 = [0.03314563036812, -0.06629126073624, -0.17677669529665,
      0.41984465132952, 0.99436891104360, 0.41984465132952,
      -0.17677669529665, -0.06629126073624, 0.03314563036812]

h1 = [0.35355339059327, 0.70710678118655, 0.35355339059327]
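The listed values agree, to the printed precision, with scaled integer taps, which makes them easy to regenerate exactly. The check below is a sketch under that assumption: the 9-tap set is taken as √2/128 × [3, -6, -16, 38, 90, 38, -16, -6, 3] and the 3-tap set as √2/4 × [1, 2, 1]; the variable names are hypothetical.

```python
import math

# Values as listed above.
H0_LISTED = [0.03314563036812, -0.06629126073624, -0.17677669529665,
             0.41984465132952, 0.99436891104360, 0.41984465132952,
             -0.17677669529665, -0.06629126073624, 0.03314563036812]
H1_LISTED = [0.35355339059327, 0.70710678118655, 0.35355339059327]

# Assumed closed forms: integer taps scaled by sqrt(2)/128 and sqrt(2)/4.
H0_EXACT = [math.sqrt(2) / 128 * n for n in (3, -6, -16, 38, 90, 38, -16, -6, 3)]
H1_EXACT = [math.sqrt(2) / 4 * n for n in (1, 2, 1)]


def max_deviation(a, b):
    """Largest absolute difference between two equal-length coefficient lists."""
    return max(abs(x - y) for x, y in zip(a, b))


print(max_deviation(H0_LISTED, H0_EXACT), max_deviation(H1_LISTED, H1_EXACT))
```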

The overall bitstream thus has an initial block of bits fully encoding LL4, then a block of bits encoding significant
coefficient locations using the initial quantization threshold, then a block of bits adding one bit of accuracy to each of the
significant coefficients, then a block of bits encoding newly-significant coefficient locations using refined quantization
thresholds, then a block of bits adding one bit of accuracy to each of the significant coefficients (both with the initial
quantization threshold and refined thresholds), and so forth until the target quantization refinement or another bitstream
size or bitrate constraint is reached.

A separate threshold preferred embodiment is now described. The three sets of subarrays (LH4, LH3, LH2, plus
LH1; HL4, HL3, HL2, plus HL1; and HH4, HH3, HH2, plus HH1) could each have its own initial threshold determined
by the maximum coefficient magnitude in that set of subarrays.

A baseband zerotree preferred embodiment is now described. Rather than separately coding the baseband, a
zerotree type coding can be used as follows. Raster scan LL4 and encode each of the 99 wavelet coefficients w(j,k)
to indicate which of the following four classes w(j,k) falls into: ZTRZ, ZTRS, IZ, IS. ZTRZ means |w(j,k)| < T0 and
ZTRS means |w(j,k)| ≥ T0, and in both cases the three coefficients at the analogous location in LH4, HL4, and HH4 are
each a zerotree root (this allows these three zerotree roots to be skipped in the scanning of LH4, HL4, and HH4). LL4
differs from the remaining subarrays because LL4 wavelet coefficients have no descendants; but in a dark background,
both highpass and lowpass coefficients will be small, so a ZTRZ or ZTRS in LL4 may provide a coding gain. Also, start
the list of significant coefficients with the scan of LL4, and append w(j,k) to the list if w(j,k) is coded as either ZTRS or
IS, plus replace it with X in the LL4 subarray for successive scans.

Three-dimensional zerotree coding preferred embodiments, as could be used for a sequence of video frames treated
as a single three-dimensional image, follow the same approach but with pixels x(i,j,k) filtered by eight filters rather than
four: h0(i)h0(j)h0(k), h0(i)h0(j)h1(k), ..., h1(i)h1(j)h1(k), to yield a hierarchical decomposition such as LLL4, LLH4, LHL4,
HLL4, LHH4, HLH4, HHL4, HHH4, LLH3, ..., HHH1 for four levels. Again, the baseband LLL4 may be separately coded
with DPCM, PCM, or another technique, and the preferred embodiment modified zerotree approach is applied to the
remaining subarrays. The scan again is in the order of this decomposition, and each subarray of wavelet coefficients
is again scanned by looping one variable at a time.

System preferred embodiments

Figure 6 illustrates in block diagram a preferred embodiment video-telephony (teleconferencing) system which
transmits both speech and an image of the speaker using one of the foregoing preferred embodiment modified zerotree
image compressions (either as individual images or as I frames in an MPEG type video compression), encoding,
decoding, and decompression, including error correction with the encoding and decoding. Of course, Figure 6 shows
only transmission in one direction and to only one receiver; in practice a second camera and second receiver would
be used for transmission in the opposite direction, and a third or more receivers and transmitters could be connected
into the system. The video and speech are separately compressed, and the allocation of transmission channel bandwidth
between video and speech may be dynamically adjusted depending upon the situation. The costs of telephone network
bandwidth demand a low-bit-rate transmission. Indeed, very-low-bit-rate video compression finds use in multimedia
applications where visual quality may be compromised.

Figure 7 shows a first preferred embodiment surveillance system, generally denoted by reference numeral 200. It comprises one or more fixed video cameras 202 focussed on stationary background 204 (with occasional moving objects 206 passing in the field of view), plus video compressor 208 together with remote storage 210, plus decoder and display 220. Compressor 208 provides compression of the stream of video images of the scene (for example, 30 frames a second with each frame 144 by 176 8-bit monochrome pixels). The data transmission rate from compressor 208 to storage 210 may therefore be very low, for example 22 Kbits per second, while retaining high quality images. System 200 relies on the stationary background and only encodes moving objects (which appear as regions in the frames which move relative to the background) with predictive motion to achieve the low data rate. This low data rate enables simple transmission channels from cameras to monitors and random-access storage such as the magnetic hard disk drives available for personal computers. Indeed, a single telephone line with a modem may transmit the compressed video image stream to a remote monitor. Further, storage of the video image stream for a time interval, such as a day or a week as required by the particular surveillance situation, will require much less memory after such compression.
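The numbers in this example can be checked directly. The figures below assume that the example rate of 22 Kbits per second is sustained for a full day.

```python
# Raw data rate of the monochrome camera stream in the example above.
FRAMES_PER_SECOND = 30
ROWS, COLS = 144, 176
BITS_PER_PIXEL = 8
COMPRESSED_BPS = 22_000  # example compressed rate from the text

raw_bps = FRAMES_PER_SECOND * ROWS * COLS * BITS_PER_PIXEL
ratio = raw_bps / COMPRESSED_BPS

SECONDS_PER_DAY = 24 * 60 * 60
compressed_bytes_per_day = COMPRESSED_BPS * SECONDS_PER_DAY // 8
raw_bytes_per_day = raw_bps * SECONDS_PER_DAY // 8

print(f"raw rate: {raw_bps} bit/s")                                    # 6082560 bit/s
print(f"compression ratio: {ratio:.0f}:1")                            # 276:1
print(f"one day compressed: {compressed_bytes_per_day / 1e6:.1f} MB")  # 237.6 MB
print(f"one day raw: {raw_bytes_per_day / 1e9:.1f} GB")               # 65.7 GB
```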

Video camera 202 may be a CCD camera with an in-camera analog-to-digital converter, so that the output to compressor 208 is a sequence of digital frames as generally illustrated in Figure 5. Alternatively, analog cameras with additional hardware may be used to generate the digital video stream of frames. Compressor 208 may be hardwired or, more conveniently, a digital signal processor (DSP) with the compression steps stored in onboard memory (RAM, ROM, or both). For example, a TMS320C5xx or TMS320C6x type DSP may suffice. Also, for a teleconferencing system as shown in Figure 6, error correction with real time reception may be included and implemented on general purpose processors.

Figure … shows a high-level flow diagram for the preferred embodiment video compression methods. These include the following steps for an input consisting of a sequence of frames F0, F1, F2, .... Each frame is 144 rows of 176 pixels or 288 rows of 352 pixels, and the frame rate is 10 frames per second. Details of the steps appear below.

Frames of these two sizes partition into arrays of 9 rows of 11 macroblocks or 18 rows of 22 macroblocks, respectively, with each macroblock being 16 pixels by 16 pixels. The frames will be encoded as I pictures or P pictures. B pictures, which require future frames for backward prediction, would create overly large time delays for very low bitrate transmission. An I picture occurs only infrequently, and the majority of frames are P pictures. For frames of 144 rows of 176 pixels, an I picture will be encoded with roughly 20 Kbits and a P picture with 2 Kbits, so the overall bitrate will be roughly 22 Kbps (at only 10 frames per second or less). The frames may be monochrome or color. Color frames consist of a luminance frame (Y signal) plus one-quarter resolution (subsampled) color combination frames (U and V signals).
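The frame-size and bitrate figures above follow from simple arithmetic. The I-picture period is not stated legibly in this copy, so `i_period_s` below is an assumed parameter. With one I picture every 10 seconds, the average comes out near the quoted 22 Kbps.

```python
from typing import Tuple


def macroblock_grid(rows: int, cols: int, mb: int = 16) -> Tuple[int, int]:
    """Macroblock rows and columns for a frame whose sides are multiples of mb."""
    return rows // mb, cols // mb


def average_bitrate(fps: float, i_period_s: float,
                    i_bits: int = 20_000, p_bits: int = 2_000) -> float:
    """Average bits per second with one I picture every i_period_s seconds
    and P pictures for all other frames (i_period_s is an assumed parameter)."""
    frames_per_period = fps * i_period_s
    bits_per_period = i_bits + (frames_per_period - 1) * p_bits
    return bits_per_period / i_period_s


print(macroblock_grid(144, 176))  # (9, 11)
print(macroblock_grid(288, 352))  # (18, 22)
print(average_bitrate(10, 10))    # 21800.0
```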

(1) Initially encode the zeroth frame F0 as an I picture, as in MPEG, using a preferred embodiment based on the wavelet transform and modified zerotree coding. First compute the multi-level decomposition of the frame. Optionally separate the baseband and encode it with PCM or DPCM (PCM provides simple full embedding). For each of the three sets of higher bands (HH1, HH2, ... HHk; HL1, HL2, ... HLk; and LH1, LH2, ... LHk), separately apply the preferred embodiment modified zerotree encoding to the wavelet coefficients. Then transmit in scan-line order with the baseband data interleaved for full embedding. Other frames will also be encoded as I frames, with the proportion of I frames dependent upon the transmission channel bitrate. If Fn is to be an I picture, encode it in the same manner as F0.
(2) For frame Fn to be a P picture, detect moving objects in the frame by finding the regions of change from reconstructed Fn-1 to Fn. Reconstructed Fn-1 is the approximation to Fn-1 which is actually transmitted, as described below. Note that the regions of change need not be partitioned into moving objects plus uncovered background, and will only approximately describe the moving objects. However, this approximation suffices and proves more efficient for low-bitrate coding. Of course, an alternative would be to also make this partition into moving objects plus uncovered background. Possible mechanisms include inverse motion vectors to determine whether a region maps to outside of the change region in the previous frame (and thus is uncovered background), edge detection to determine the object, or presumption of object characteristics (models) to distinguish the object from background.
(3) For each connected component of the regions of change from step (2), code its boundary contour, including any interior holes. Thus the boundaries of moving objects are not exactly coded. Rather, the boundaries of entire regions of change are coded and approximate the boundaries of the moving objects. The boundary coding may be either by splines approximating the boundary or by a binary mask indicating blocks within the region of change. The spline provides a more accurate representation of the boundary, but the binary mask uses a smaller number of bits. Note that the connected components of the regions of change may be determined by a raster scan of the binary image mask that sorts pixels in the mask into groups according to the sorting of adjacent pixels; groups may merge during the scan. The final groups of pixels are the connected components (connected regions). For an example of a program, see Ballard et al., Computer Vision (Prentice Hall) at pages 149-152. For convenience in the following, the connected components (connected regions) may be referred to as (moving) objects.
(4) Remove temporal redundancies in the video sequence by motion estimation of the objects from the previous frame. In particular, match a 16 by 16 block in an object in the current frame Fn with the 16 by 16 block in the same location in the preceding reconstructed frame Fn-1, plus translations of this block by up to 15 pixels in all directions. The best match defines the motion vector for this block. An approximation F̂n to the current frame Fn can then be synthesized from the preceding frame Fn-1 by using the motion vectors with their corresponding blocks of the preceding frame.

(5) After the motion compensation of objects to synthesize an approximation F̂n, there may still be areas within the frame which contain a significant amount of residual information, such as fast-changing areas. That is, motion segmentation analogous to steps (2)-(3) is applied to the regions of difference between Fn and the synthesized approximation F̂n. This defines the motion failure regions, which contain significant information.

(6) Encode the motion failure regions from step (5) using a waveform coding technique based on the DCT or the wavelet transform. For the DCT case, tile the regions with 16 by 16 macroblocks, apply the DCT on 8 by 8 blocks of the macroblocks, then quantize and encode (run-length and then Huffman coding). For the wavelet case, set all pixel values outside the regions to a constant (e.g., zero) and apply the multi-level decomposition. Then quantize and encode (zerotree and then arithmetic coding) only those wavelet coefficients corresponding to the motion failure regions.
(7) Assemble the encoded information for I pictures (DCT or wavelet data) and P pictures (objects ordered, with each object having contour, motion vectors, and motion failure data). These can be codewords from a table of Huffman codes. This is not a dynamic table but rather is generated experimentally.






EP 0 892 557 A1 



(8) Insert resynchronization words at the beginning of each I picture data, each P picture, each contour data, each motion vector data, and each motion failure data. These resynchronization words are unique in that they do not appear in the Huffman codeword table, and thus can be unambiguously determined.

(9) Encode the resulting bitstream from step (8) with Reed-Solomon codes together with interleaving. Then transmit or store.

(10) Decode a received encoded bitstream by Reed-Solomon decoding plus deinterleaving. The resynchronization words help after decoding failure and also provide access points for random access. Further, the decoding may use shortened Reed-Solomon decoders on either side of the deinterleaver, plus feedback from the second decoder to the first decoder (a stored copy of the decoder input), for enhanced error correction.

(11) Additional functionalities such as object scalability (selective encoding/decoding of objects in the sequence) and quality scalability (selective enhancement of the quality of the objects), which result in a scalable bitstream, are also supported.
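Step (1) calls for a multi-level decomposition of the frame into a baseband and three sets of higher bands. The preferred embodiment filters are not specified in this passage. The sketch below therefore substitutes the simple Haar averaging/differencing pair, as an assumption. The band naming (first letter vertical filter, second letter horizontal filter) is likewise a convention chosen only for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def _haar_1d(v) -> Tuple[List[float], List[float]]:
    """One Haar analysis step: pairwise averages (lowpass) and half-differences (highpass)."""
    half = len(v) // 2
    low = [(v[2 * n] + v[2 * n + 1]) / 2 for n in range(half)]
    high = [(v[2 * n] - v[2 * n + 1]) / 2 for n in range(half)]
    return low, high


def _transpose(m) -> Matrix:
    return [list(col) for col in zip(*m)]


def _filter_columns(m) -> Tuple[Matrix, Matrix]:
    """Apply the Haar step down every column; return (lowpass rows, highpass rows)."""
    lows, highs = zip(*(_haar_1d(col) for col in _transpose(m)))
    return _transpose(lows), _transpose(highs)


def haar_2d_level(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level: returns LL, LH, HL, HH (first letter vertical, second horizontal)."""
    row_lows, row_highs = zip(*(_haar_1d(row) for row in x))
    ll, hl = _filter_columns(row_lows)   # horizontally lowpassed rows
    lh, hh = _filter_columns(row_highs)  # horizontally highpassed rows
    return ll, lh, hl, hh


def wavelet_decompose(x: Matrix, levels: int) -> Dict[str, Matrix]:
    """Bands LH1, HL1, HH1, ..., LHk, HLk, HHk plus baseband LLk (k = levels).

    Frame sides must be divisible by 2 ** levels.
    """
    bands: Dict[str, Matrix] = {}
    ll = x
    for k in range(1, levels + 1):
        ll, lh, hl, hh = haar_2d_level(ll)
        bands[f"LH{k}"], bands[f"HL{k}"], bands[f"HH{k}"] = lh, hl, hh
    bands[f"LL{levels}"] = ll
    return bands
```

The three sets HH1 ... HHk, HL1 ... HLk, and LH1 ... LHk can then be read from the returned dictionary and passed separately to the zerotree coder.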
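Steps (2) and (3) find the regions of change and then group the changed pixels into connected components by a raster scan in which groups may merge. The sketch below rests on two assumptions: a fixed per-pixel difference threshold (the text does not say how change is detected) and 4-connectivity. It uses the classic two-pass labeling with a union-find table and is not claimed to be the program in the cited Ballard et al. reference.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def change_mask(current: Grid, reconstructed_prev: Grid, threshold: int) -> Grid:
    """1 where |Fn - reconstructed Fn-1| exceeds threshold (assumed detection rule)."""
    return [[1 if abs(c - p) > threshold else 0 for c, p in zip(rc, rp)]
            for rc, rp in zip(current, reconstructed_prev)]


def connected_components(mask: Grid) -> Tuple[Grid, int]:
    """Two-pass raster-scan labeling of 4-connected regions of 1s."""
    parent: Dict[int, int] = {}

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # the two groups merge

    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    # First pass: inherit a label from the upper or left neighbour, or start a new group.
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if not up and not left:
                parent[next_label] = next_label
                labels[i][j] = next_label
                next_label += 1
            else:
                labels[i][j] = min(x for x in (up, left) if x)
                if up and left:
                    union(up, left)
    # Second pass: replace each label by its merged group, numbered 1, 2, 3, ...
    renumber: Dict[int, int] = {}
    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                root = find(labels[i][j])
                labels[i][j] = renumber.setdefault(root, len(renumber) + 1)
    return labels, len(renumber)
```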
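For step (4), full-search block matching over a window of up to 15 pixels in each direction can be sketched as follows. The text only says "best match". The sum of absolute differences (SAD) used here as the matching criterion is an assumption. So is the choice to skip candidate blocks that fall outside the reference frame.

```python
from typing import List, Tuple

Frame = List[List[int]]


def block_sad(cur: Frame, ref: Frame, top: int, left: int,
              dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
               for i in range(size) for j in range(size))


def motion_vector(cur: Frame, ref: Frame, top: int, left: int,
                  size: int = 16, search: int = 15) -> Tuple[int, int]:
    """Full search over displacements up to +/- search pixels; returns (dy, dx)."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would extend outside the reference frame.
            if not (0 <= top + dy and top + dy + size <= rows
                    and 0 <= left + dx and left + dx + size <= cols):
                continue
            cost = block_sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best[1], best[2]
```

The block of F̂n predicted for this position is then `ref[top + dy + i][left + dx + j]` for the returned `(dy, dx)`.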
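In the DCT case of step (6), quantized 8 by 8 blocks are run-length coded before Huffman coding. The sketch below assumes the conventional MPEG/JPEG zigzag scan, (run, level) pairs, and a uniform quantizer step. The DCT itself and the end-of-block code are omitted.

```python
from typing import List, Tuple


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) visiting order of the conventional zigzag scan of an n by n block."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals run bottom-left to top-right, odd ones top-right to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def quantize(block: List[List[float]], step: float) -> List[List[int]]:
    """Uniform quantizer (assumed); Python's round() sends exact halves to the even integer."""
    return [[int(round(c / step)) for c in row] for row in block]


def run_length(qblock: List[List[int]]) -> List[Tuple[int, int]]:
    """(zero run, nonzero level) pairs along the zigzag scan; trailing zeros are
    dropped, where a real coder would emit an end-of-block code."""
    pairs: List[Tuple[int, int]] = []
    run = 0
    for i, j in zigzag_order(len(qblock)):
        level = qblock[i][j]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```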
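Step (7) uses a fixed Huffman table generated experimentally rather than a dynamic one. One way to produce such a table, assuming symbol counts are gathered offline from training material, is the standard Huffman construction sketched below. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable


def build_huffman_table(training_symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Static prefix code built once from symbol counts gathered offline."""
    freq = Counter(training_symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a one-bit codeword
        return {sym: "0" for sym in freq}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 and 1 to their codewords.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols: Iterable[Hashable], table: Dict[Hashable, str]) -> str:
    return "".join(table[s] for s in symbols)
```

Because the table is fixed, encoder and decoder can both hold it in ROM, and nothing about the table needs to be transmitted.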
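Steps (9) and (10) pair Reed-Solomon coding with interleaving, so that a burst of channel errors is spread across several codewords. The Reed-Solomon coder itself is omitted here. The sketch shows only a block interleaver of assumed depth, with each row standing in for one codeword.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave(symbols: Sequence[T], depth: int) -> List[T]:
    """Block interleaver: write `depth` rows (one codeword each) row by row, read column by column."""
    if len(symbols) % depth:
        raise ValueError("length must be a multiple of depth")
    width = len(symbols) // depth
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols: Sequence[T], depth: int) -> List[T]:
    """Inverse of interleave(): the symbol from row r, column c sits at c * depth + r."""
    width = len(symbols) // depth
    return [symbols[c * depth + r] for r in range(depth) for c in range(width)]
```

A burst of at most `depth` consecutive corrupted channel symbols therefore lands in each codeword at most once after deinterleaving. A Reed-Solomon decoder is far more likely to correct one error per codeword than a whole burst in a single codeword.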

A Web Server and Browsing Application will now be described. The most annoying problem with net-surfing today is the delay caused by limited bandwidth, much of it due to heavy use of graphics and images in web pages. This problem is not likely to be alleviated soon, because any increase in bandwidth is likely to be offset by even more widespread use of large images and graphics by content providers.

A good way of accelerating the downloading of images and graphics is the use of a highly scalable image codec. The original image could be coded at high bitrate and fine resolution and stored on the web server. The server can then provide different versions of the image to different users according to their respective bandwidths, or the images can be progressively transmitted to the end user while the user is doing other jobs. A good, efficient scalable image coder is hence essential in this scenario.

The widely used JPEG coding standard indeed has a scalable profile that can provide a certain level of scalability. However, the limited scalability comes with a loss in coding efficiency. In addition, no spatial resolution scalability is supported. In light of this, the MPEG-4 texture codec as well as the preferred embodiment predictive embedded zerotree algorithm is a natural fit for web server and browsing applications.

The web servers and browsers incorporating the scalable image codec work as follows:
The images are stored on the web server coded with high fidelity, i.e., high resolution, high bitrate, etc.
When a user requests downloading of the images, the server selects the bitrate according to the bandwidth available and selects the spatial resolution according to the user's preference and bandwidth. Therefore, a user with a high-bandwidth connection will automatically receive a high-fidelity version of the coded image, while a user with a 28.8k modem will receive a lower-fidelity version.

The browser will decode the first M bits received and display the image with minimal latency. Note that M could be any number here, so the latency is controllable. The image will be refined as more bits come in.
The above procedure will dramatically reduce the latency in web browsing caused by downloading graphics and images.
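Because the bitstream is embedded, choosing M amounts to truncating the stream. A minimal sketch, assuming M is set from the connection bandwidth and a target display latency and rounded down to whole bytes, is shown below. The function name and parameters are hypothetical.

```python
def bytes_to_send(embedded_stream: bytes, bandwidth_bps: float, latency_s: float) -> bytes:
    """Longest prefix of an embedded stream deliverable within latency_s at bandwidth_bps.

    Any prefix of an embedded stream is decodable, so this prefix plays the role of
    the first M bits (rounded down to whole bytes here).
    """
    m_bits = int(bandwidth_bps * latency_s)
    return embedded_stream[: m_bits // 8]
```

For example, a 28.8k modem user with a 2-second target would receive the first 7,200 bytes. A faster connection would receive the whole stream within the same time.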

A Dynamic Rate Shaping Application will now be described. In network communications, end-to-end data transmission is accomplished by relaying the data packets along a path of network routers. The bandwidth of the end-to-end connection depends on the bandwidth of all the hops in the data path. On a network where quality of service (QoS) is not guaranteed, such as a TCP/IP network, the bandwidth of the connection tends to fluctuate a lot. In a real-time communication application, it is therefore important for the network routers to be able to scale the bitrate of the data transmissions dynamically. With the preferred embodiment algorithm, the bitstream is embedded and scalable down to bit-level precision. The network routers could simply discard any packets of lower priority (bit-plane-wise or scale-wise), and the end user could still decode the received bitstream. The routers in the middle of the path can therefore make the rate adjustment without requesting retransmission.
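A router-side version of this rate shaping might look like the sketch below. It assumes each packet carries a hypothetical priority tag, with 0 for the most important data (top bit-plane or coarsest scale). Whole priority levels are kept from the most important downward while they fit the budget, because a lower-priority refinement is of no use without the levels above it.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    priority: int   # hypothetical tag: 0 = most important (top bit-plane / coarsest scale)
    size_bits: int


def shape(packets: List[Packet], budget_bits: int) -> List[Packet]:
    """Keep whole priority levels, most important first, while they fit in budget_bits;
    every packet of a less important level is discarded. Survivors keep their order."""
    used = 0
    cutoff = -1  # highest priority number still kept
    for level in sorted({p.priority for p in packets}):
        level_bits = sum(p.size_bits for p in packets if p.priority == level)
        if used + level_bits > budget_bits:
            break
        used += level_bits
        cutoff = level
    return [p for p in packets if p.priority <= cutoff]
```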

A Texture Mapping Application will now be described. Texture mapping gives realism to computer-generated images and speeds up the rendering of a scene. Efficient texture mapping is becoming increasingly important for graphics applications. Mapping compressed images directly saves on-board memory and enables the mapping of large images. Talisman is a recent architecture that makes use of JPEG-based compression for texture mapping. The preferred embodiment algorithm can also be applied here to achieve efficient texture mapping.

The Mip texture mapping technique uses a multiresolution representation of an image to reduce computation in texture mapping. In traditional Mip mapping, a pyramid of images of various resolutions is generated and stored, which can take up to 1 1/3 times the storage space of the original image. With the preferred embodiment predictive embedded zerotree algorithm, the image pyramid is generated by the wavelet transform without oversampling (the number of wavelet coefficients is the same as the number of image pixels), and the codec further compresses the image significantly. In addition to the syntax described above, there will be a lookup table for each 64x64 block, similar to the chunking technology adopted in Talisman. The graphics hardware can choose the resolution and quality (bit-plane) according to the viewpoint and decode only the needed blocks. This technology enables storing large images on the graphics board, as well as flexibility in choosing the resolution and quality of the image to be mapped. These features are even more important when the scene images are fed in real time through networks, as in Internet virtual reality or gaming applications.

The preferred embodiments may be varied in many ways while retaining one or more of their features of zerotree coding with a wildcard symbol used for replacement of significant coefficients.

For example, the size of the images or frames, the number of decomposition levels, the initial thresholds, quantization levels, symbols, and so forth can be changed. Generally, subband filtering of other types such as QMF and Johnston could be used in place of the wavelet filtering, provided that the region-of-interest based approach is maintained. Images (data structures) with one, or four or more, dimensions can analogously be encoded by subband decomposition and modified zerotree coding.

In view of the foregoing description it will be evident to a person skilled in the art that various modifications may 
be made within the scope of the invention. 

The scope of the present disclosure includes any novel feature or combination of features disclosed therein either 
explicitly or implicitly or any generalisation thereof irrespective of whether or not it relates to the claimed invention or
mitigates any or all of the problems addressed by the present invention. The applicant hereby gives notice that new
claims may be formulated to such features during the prosecution of this application or of any such further application 
derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be com- 
bined with those of the independent claims in any appropriate manner and not merely in the specific combinations 
enumerated in the claims. 
Claims



1. A bitstream structure, comprising: 

(a) symbols for a subband filtered image including a first symbol representing a significant pixel value with all descendant pixels with insignificant values and a second symbol representing an insignificant pixel value with all descendant pixels with insignificant values.

2. A method of encoding an image, comprising the steps of: 


(a) decomposing an image into subarrays of coefficients by lowpass and highpass filtering; 

(b) encoding the subarrays with a zerotree coding including a first symbol for significant coefficients with all 
insignificant descendants and a second symbol for insignificant coefficients with all insignificant descendants. 

3. A method of decoding an encoded image, comprising the steps of:

(a) interpreting a first symbol as a significant value but with all insignificant descendant values; and

(b) interpreting a second symbol as an insignificant value and with all insignificant descendant values.